Problem 1

Problem 2

a, b and c

d and e

Problem 3

least squares estimates

Problem 4

a)

First estimate
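# pinv(), Rank() and nullspace() used below are assumed to come from the pracma package
library(pracma)
body = read.csv("Bodyfat.csv")

# Design matrix: intercept, Age, Weight, Height, and a fifth column that is
# an exact linear combination of columns 2 to 4
xmat = matrix(0, nrow(body), 5)
xmat[,1] = rep(1, nrow(body))
xmat[,2] = body$Age
xmat[,3] = body$Weight
xmat[,4] = body$Height
xmat[,5] = body$Age + 10*body$Weight + 3*body$Height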


lmod1 = lm(bodyfat ~ Age + Weight + Height + I(Age + 10*Weight + 3*Height), data = body)
summary(lmod1)$coefficients
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.7673848 7.47935066  2.375525 1.828479e-02
## Age          0.1697902 0.02956033  5.743853 2.698997e-08
## Weight       0.1981519 0.01312664 15.095402 5.829476e-37
## Height      -0.5943339 0.10690038 -5.559698 6.972685e-08

Using the lm function, which drops the redundant fifth column (equivalent to setting \(\beta_4 = 0\)), the least squares fit is \(\widehat{BodyFat} = 17.7673848 + 0.1697902\,Age + 0.1981519\,Weight - 0.5943339\,Height\)

Second estimate

Using the SVD-based pseudo-inverse of X

pseudo_inverse = pinv(xmat)
# these are the least squares estimates of beta0 to beta4
esti = pseudo_inverse %*% body$bodyfat
esti
##              [,1]
## [1,] 17.767384801
## [2,]  0.166472102
## [3,]  0.164971028
## [4,] -0.604288101
## [5,]  0.003318082
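
The pseudo-inverse step can be reproduced in base R from the singular value decomposition. The following is a minimal sketch on simulated data; the variables, sample size, and tolerance rule are assumptions made for illustration and are not the Bodyfat data or the exact rule used by pinv().

```r
# Hypothetical simulated data standing in for Bodyfat.csv (not the real data set).
set.seed(1)
n <- 50
Age <- rnorm(n, 45, 12)
Weight <- rnorm(n, 180, 25)
Height <- rnorm(n, 70, 3)
y <- 18 + 0.17 * Age + 0.2 * Weight - 0.6 * Height + rnorm(n, sd = 5)

# Design matrix with a redundant fifth column, as in part a.
X <- cbind(1, Age, Weight, Height, Age + 10 * Weight + 3 * Height)

# SVD X = U D V^T; keep only singular values above a small tolerance
# (the tolerance rule here is an assumption for this sketch).
s <- svd(X)
tol <- max(dim(X)) * max(s$d) * .Machine$double.eps
keep <- s$d > tol

# Pseudo-inverse V D^{-1} U^T restricted to the retained singular values.
X_pinv <- s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
beta_min_norm <- drop(X_pinv %*% y)

# The fitted values agree with lm() on the three non-redundant predictors.
fit <- lm(y ~ Age + Weight + Height)
max_diff <- max(abs(X %*% beta_min_norm - fitted(fit)))
max_diff  # should be numerically close to zero
```

Among all least squares solutions, the pseudo-inverse solution is the one with the smallest Euclidean norm, which is why it spreads weight across all five coefficients instead of setting one to zero.
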
Third estimate
# The rank of X is 4, because the last column is a linear combination of the previous columns.
Rank(xmat)
## [1] 4
# Find the vector that spans the null space of X
x5 = nullspace(xmat)
# This vector is orthogonal to the row space of X, so xmat %*% x5 = 0
# and adding it to a solution leaves the fitted values unchanged.
# A new least squares estimate is therefore
esti + x5
##             [,1]
## [1,] 17.76738480
## [2,]  0.07155630
## [3,] -0.78418697
## [4,] -0.88903550
## [5,]  0.09823388
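
Shifting a solution along the null space does not change the fit, which is why infinitely many least squares estimates exist here. The sketch below illustrates this on simulated data; the data and the shift size 2.5 are hypothetical, and the null-space direction is read off from the column relation rather than computed by nullspace().

```r
# Hypothetical simulated data (not the Bodyfat data set).
set.seed(2)
n <- 40
Age <- rnorm(n, 45, 12)
Weight <- rnorm(n, 180, 25)
Height <- rnorm(n, 70, 3)
y <- 18 + 0.17 * Age + 0.2 * Weight - 0.6 * Height + rnorm(n, sd = 5)
X <- cbind(1, Age, Weight, Height, Age + 10 * Weight + 3 * Height)

# One least squares solution: lm() coefficients, with 0 for the redundant column.
beta_a <- c(unname(coef(lm(y ~ Age + Weight + Height))), 0)

# Column 5 = column 2 + 10 * column 3 + 3 * column 4, so X %*% v = 0.
v <- c(0, 1, 10, 3, -1)

# Any shift along v gives another solution with the same residual sum of squares.
beta_b <- beta_a + 2.5 * v
rss <- function(b) sum((y - X %*% b)^2)
c(rss_a = rss(beta_a), rss_b = rss(beta_b))
max(abs(X %*% v))  # numerically zero
```
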

b)

The vector x5 spans the null space of \(X\) and is orthogonal to the row space of \(X\). Write
\[\beta_1 =\lambda^T\beta= (0,1,0,0,0)\,\beta, \qquad \lambda= \begin{pmatrix} 0\\ 1\\ 0\\ 0\\ 0 \end{pmatrix}.\]
Then \(\lambda \cdot \text{x5} = -0.0949158\), which is not equal to 0. This means that \(\lambda\) is not in the row space of \(X\), so \(\beta_1\) is not estimable.
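
The reported dot product can be cross-checked by hand. This assumes that nullspace() returns a unit-length basis vector, which is consistent with the numbers in part a. Since column 5 of \(X\) equals column 2 plus 10 times column 3 plus 3 times column 4,

\[X \begin{pmatrix} 0\\ 1\\ 10\\ 3\\ -1 \end{pmatrix} = 0, \qquad \left\lVert (0,1,10,3,-1)^T \right\rVert = \sqrt{1+100+9+1} = \sqrt{111},\]

so the unit vector spanning the null space is \(\pm\frac{1}{\sqrt{111}}(0,1,10,3,-1)^T\), and

\[\lambda \cdot \text{x5} = \pm\frac{1}{\sqrt{111}} \approx \pm 0.0949158.\]

The sign depends only on which of the two unit vectors the routine returns; either way the product is nonzero.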

c)

We assume the model to be
\[BodyFat = \beta_0 +\beta_1 Age +\beta_2 Weight+\beta_3 Height+\beta_4(Age+10\,Weight+3\,Height),\]
but we can rewrite it as
\[BodyFat = \beta_0 +(\beta_1+\beta_4) Age + (\beta_2+10\beta_4)Weight+(\beta_3+3\beta_4)Height.\]
From the lm fit in part a, the least squares estimates of these combinations are
\[\begin{aligned} \hat\beta_0 &= 17.7673848\\ \widehat{\beta_1+\beta_4} &= 0.1697902\\ \widehat{\beta_2+10\beta_4} &= 0.1981519\\ \widehat{\beta_3+3\beta_4} &= -0.5943339 \end{aligned}\]
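
As a consistency check, the second and third estimates from part a give the same values for these combinations, up to rounding. For the pseudo-inverse estimate:

\[\begin{aligned} 0.166472102 + 0.003318082 &= 0.169790184\\ 0.164971028 + 10(0.003318082) &= 0.198151848\\ -0.604288101 + 3(0.003318082) &= -0.594333855 \end{aligned}\]

For the third estimate:

\[\begin{aligned} 0.07155630 + 0.09823388 &= 0.16979018\\ -0.78418697 + 10(0.09823388) &= 0.19815183\\ -0.88903550 + 3(0.09823388) &= -0.59433386 \end{aligned}\]

The individual coefficients differ between the solutions, but these combinations do not.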

d)

Yes, we can read the estimates off directly from part c.

summary(lm(bodyfat ~ Age + Weight + Height + I(Age + 10*Weight +
3*Height), data = body))
## 
## Call:
## lm(formula = bodyfat ~ Age + Weight + Height + I(Age + 10 * Weight + 
##     3 * Height), data = body)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.3960  -4.5038  -0.0326   3.8324  15.7154 
## 
## Coefficients: (1 not defined because of singularities)
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       17.76738    7.47935   2.376   0.0183 *  
## Age                                0.16979    0.02956   5.744 2.70e-08 ***
## Weight                             0.19815    0.01313  15.095  < 2e-16 ***
## Height                            -0.59433    0.10690  -5.560 6.97e-08 ***
## I(Age + 10 * Weight + 3 * Height)       NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.809 on 248 degrees of freedom
## Multiple R-squared:  0.524,  Adjusted R-squared:  0.5182 
## F-statistic: 90.99 on 3 and 248 DF,  p-value: < 2.2e-16

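We found that if a column is a linear combination of the other columns, the lm function drops that column: its coefficient is reported as NA ("1 not defined because of singularities") and the model is fit on the remaining columns.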
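
The dropped column can also be detected programmatically instead of by reading the summary. The following sketch uses simulated data (hypothetical, not the Bodyfat data set) and the base R functions coef() and alias().

```r
# Hypothetical simulated data with the same variable names as in the analysis.
set.seed(3)
n <- 30
d <- data.frame(
  Age = rnorm(n, 45, 12),
  Weight = rnorm(n, 180, 25),
  Height = rnorm(n, 70, 3)
)
d$bodyfat <- 18 + 0.17 * d$Age + 0.2 * d$Weight - 0.6 * d$Height + rnorm(n, sd = 5)

fit <- lm(bodyfat ~ Age + Weight + Height + I(Age + 10 * Weight + 3 * Height),
          data = d)

# The coefficient of the redundant column is NA, and the fitted rank is 4.
coef(fit)
fit$rank

# alias() reports how the dropped term depends on the retained ones.
alias(fit)$Complete
```
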

Problem 5